Summarization of Legal Texts with High Cohesion and Automatic Compression Rate

Authors

  • Mi-Young Kim
  • Ying Xu
  • Randy Goebel
Abstract

We describe a method for extractive summarization of legal judgments using our own graph-based summarization algorithm. In contrast to the connected, undirected graphs of previous work, we construct directed, disconnected graphs (a set of connected graphs) for each document, where each connected graph indicates a cluster that shares one topic in the document. Our method automatically chooses the number of representative sentences with coherence for summarization, so we do not need to provide the desired compression rate a priori. We also propose our own node- and edge-weighting scheme for the graph. Furthermore, we do not depend on expensive hand-crafted linguistic features or resources. Our experimental results show that our method outperforms previous clustering-based methods, including those that use TF*IDF-based and centroid-based sentence selection, as well as previous machine learning methods that exploit a variety of linguistic features.
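The abstract does not give the weighting scheme or the selection rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical edge rule (Jaccard word overlap above a threshold, directed from the earlier sentence to the later one), treats each weakly connected component as one topic cluster, and, purely for illustration, picks one representative sentence per component. Under those assumptions the summary length falls out of the graph structure rather than a preset compression rate. All function names and the `threshold` parameter are illustrative.

```python
import re
from collections import defaultdict


def tokenize(sentence):
    """Lowercase word set; a stand-in for whatever features the paper uses."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def build_directed_graph(sentences, threshold=0.2):
    """Assumed rule: edge i -> j (i earlier than j) if Jaccard overlap >= threshold."""
    tokens = [tokenize(s) for s in sentences]
    edges = {}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            union = tokens[i] | tokens[j]
            if not union:
                continue
            weight = len(tokens[i] & tokens[j]) / len(union)
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


def weak_components(n, edges):
    """Group nodes into weakly connected components using union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for node in range(n):
        groups[find(node)].append(node)
    return sorted(groups.values())


def summarize(sentences, threshold=0.2):
    """Pick one sentence per component: the node with the largest total edge weight."""
    edges = build_directed_graph(sentences, threshold)
    strength = defaultdict(float)
    for (i, j), weight in edges.items():
        strength[i] += weight
        strength[j] += weight
    chosen = []
    for component in weak_components(len(sentences), edges):
        # On ties, max() keeps the earliest sentence in the component.
        chosen.append(max(component, key=lambda node: strength[node]))
    return [sentences[k] for k in sorted(chosen)]


sentences = [
    "The court dismissed the appeal.",
    "The appeal was dismissed by the court with costs.",
    "Damages were assessed at trial.",
    "The trial judge assessed damages for the claimant.",
]
summary = summarize(sentences)
```

With these four toy sentences the graph splits into two components (appeal, damages), so the sketch returns two sentences without being told how many to select. A lower `threshold` merges more sentences into fewer clusters and therefore shortens the summary.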


Similar resources

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Summarization of Multimodal Information

Information Summarization is one of the key challenges for current and future information systems. In this paper, we will outline a system that comprises modules for summarizing texts and time series to study the link between the two. Summaries of texts are generated using a lexical analysis of cohesion in texts focusing on key sentences that provide cohesion: by implication, these are the sent...

Disunity in Cohesion: How Purpose Affects Methods and Results When Analyzing Lexical Cohesion

Lexical Cohesion is a commonly studied linguistic feature as it is easily identified from the surface of a text. However, the purposes for studying lexical cohesion are varied, and each purpose requires different methods. This study analyzes two short movie review texts for four different research purposes using lexical cohesion: text evaluation, text segmentation, text summarization, and text ...

SIMBA: An Extractive Multi-document Summarization System for Portuguese

This is a proposal for a demonstration of SIMBA at PROPOR 2012. SIMBA is an extractive multi-document summarization system that aims at producing generic summaries guided by a compression rate defined by the user. It uses a double-clustering approach to find the relevant information in a set of texts. In addition, SIMBA uses a sentence simplification procedure as a means to ensure summary compress...

Publication date: 2012